The Rockley Report

Best Practices

Managing a translation flow: best practices

Hélène Keufgens
Cogen sa
helene.keufgens@cogen.com

Managing global content combines content management and content translation processes. Each of these draws on different technologies and skills. Companies that publish multilingual information without internal translation skills find that they fare best by keeping the two workflows separate and creating an effective hand-over between them.

Content management encompasses a number of complex processes, such as the modeling of legacy content into manageable assets, the authoring of content with minimal re-creation, the storage and categorization of content for maximum searchability and reuse, and its delivery through the largest possible number of channels.

Content translation, on the other hand, involves leveraging a translation memory, maintaining its content objects and keeping them aligned across languages, and facilitating terminology searches. It also involves managing a large multicultural team of translators, in-country reviewers and content end-users, and resolving differences of opinion on subjective linguistic matters.

Companies that have neither an internal translation team nor a large translation management staff can optimize their relationship with a specialized language service provider.

Steps to an efficient global content management process

Structuring content for maximum reuse, and for best translation results

Reuse of content through content management can be optimized by a unified content strategy [1]. This content strategy implies distilling the content produced by a company (or a department) to one occurrence of any given concept, in one content segment, and attaching to that segment the categorization tags that will make sure it can be found and reused. ("If you have it, but can't find it, you don't have it!")

The segmentation ("chunking") of the content into coherent units, and the definition of the content granularity are critical issues in that respect. The right balance between small and large units produces the best reuse rates and translation results.

Small content units improve reuse rates, but leave authors and translators at a total loss with ambiguous sentences such as "Replace it." or "Empty wastebasket.", which give no indication about the nature (nor the gender!) of the "it" object in the first phrase, nor of the function of "empty" (verb or adjective) in the second.

Large content segments, on the other hand, provide plenty of context, but hamper reuse.

The best results are obtained with "self-contained content units", i.e. units that can be understood on their own, without the help of preceding or subsequent segments.

Hand-over between content management and translation workflows

Once a content strategy is in place, and legacy content is captured and categorized in the content management system's repository, authors can start producing new "documents" by assembling existing content units, creating new ones, or updating previous ones.

In a global content management system, the release of new or updated content units should trigger the hand-over from the content management workflow to the translation workflow, in the form of:

  • an export of that content to an XML file
  • an upload of that file to the appropriate server
  • a notification to the language service provider that content is ready for translation
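The first step of this hand-over, the export, can be sketched as follows. This is a minimal illustration only: the element names (`handover`, `unit`), the attribute names, and the sample units are assumptions made for the example, not a schema prescribed by any content management system. The upload and the notification depend on the infrastructure agreed with the language service provider and are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical released content units: (unit ID, source language, text).
released_units = [
    ("123", "en", "Replace the toner cartridge."),
    ("124", "en", "Empty the wastebasket."),
]

def export_units(units):
    """Build one XML hand-over document from released content units."""
    root = ET.Element("handover")  # assumed root element name
    for unit_id, lang, text in units:
        unit = ET.SubElement(root, "unit", {"id": unit_id, "lang": lang})
        unit.text = text
    return ET.tostring(root, encoding="unicode")

handover_xml = export_units(released_units)
print(handover_xml)
```

Keeping the unit ID on every exported element is what later allows the translated text to be matched back to its source.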

What should happen in the translation process?

Analysis of XML data

Although the actual schema or doctype does not impact the translation process, it is advisable to agree upon a schema structure with the language service provider and to use it consistently. With a common schema in place, the translatable content can be automatically located in the hand-over.

Extracting content from format

Content is extracted from the XML (tags are filtered out) in order to:

  • eliminate non-translatable content segments (reduces the translation cost)
  • obtain clean and untagged content (for maximum leverage of translation memory)
  • obtain clean documents (easier for contributors to work with)
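The extraction step described above might look like the following sketch. It assumes a hypothetical hand-over format in which each `unit` element carries an `id`, inline formatting tags such as `b` or `i` may appear inside the text, and non-translatable units are flagged with a `translate="no"` attribute. All of these names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-over file with inline tags and one non-translatable unit.
SAMPLE = """<handover>
  <unit id="123">Replace the <b>toner</b> cartridge.</unit>
  <unit id="124" translate="no">PN 4471-A</unit>
  <unit id="125">Empty the <i>wastebasket</i>.</unit>
</handover>"""

def extract_translatable(xml_text):
    """Return {unit ID: tag-free text} for every translatable unit."""
    root = ET.fromstring(xml_text)
    segments = {}
    for unit in root.iter("unit"):
        if unit.get("translate") == "no":
            continue  # skip content that must not be translated
        text = "".join(unit.itertext())  # drops inline tags, keeps their text
        segments[unit.get("id")] = " ".join(text.split())  # normalize whitespace
    return segments

segments = extract_translatable(SAMPLE)
for unit_id, text in segments.items():
    print(unit_id, text)
```

A production filter would also record where the inline tags were, so that they can be restored after translation; this sketch simply discards them.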

If you are creating content in multiple formats (e.g., Quark, Word, HTML, FrameMaker, XML, etc.) it can be difficult to effectively filter out the formatting tags and clean up the content enough to produce satisfactory reuse rates between content coming from various formats. This reduces the effectiveness of your translation memory in identifying reuse.

Companies planning to leave some content (say, marketing collateral) out of the content management system temporarily may still want to leverage that content, produced in traditional desktop publishing formats, to reduce their translation costs. For them, the cross-platform performance of the filtering tool is an important parameter.

Figure 1 shows an example of a tag-free document for translation and review, with source content in one column and the "fuzzy matching" translation, if any, in the other.



Figure 1: An example of tag-free translation and review of a document with full and fuzzy matches



Leveraging of translation memory

New or updated content created within a content management system should be cross-referenced against the translation memory.

During this comparison:

  • Updated content produces fuzzy matches, which a translation memory is built to identify
  • Content that is new in the content management system can produce full reuses or fuzzy matches when compared to traditionally produced materials, provided the language service provider uses cross-platform tools
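The difference between a full reuse and a fuzzy match can be illustrated with a small sketch. It uses the character-based similarity ratio from Python's standard `difflib` module as a stand-in; commercial translation memory tools use their own scoring methods. The sample memory and the 75% threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
translation_memory = {
    "Replace the toner cartridge.": "Remplacez la cartouche de toner.",
    "Empty the wastebasket.": "Videz la corbeille.",
}

def best_match(segment, memory, fuzzy_threshold=0.75):
    """Classify a segment as a 'full', 'fuzzy' or 'none' match against the memory."""
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score == 1.0:
        return "full", best_source, memory[best_source]
    if best_score >= fuzzy_threshold:
        return "fuzzy", best_source, memory[best_source]
    return "none", None, None

# An unchanged segment is reused as is; an updated one is offered as a fuzzy match.
print(best_match("Replace the toner cartridge.", translation_memory))
print(best_match("Replace the black toner cartridge.", translation_memory))
```

A fuzzy match hands the translator the earlier translation as a starting point, so only the changed part of the segment needs new work.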

Translation, followed by in-country review

Handing over self-contained content units to the language service provider in an XML-based automated workflow gives translators and in-country reviewers enough context to work with and allows them to work offline, using the productivity tools they are familiar with.

The language service provider should give them online access to the translation memory to do full-text and in-context terminology searches.

Providing a copy of the source document in its final form gives translators and reviewers the context of illustrations and figure information.

The review stage involves discussions between all contributors, especially between in-country translators and reviewers. You and your language service provider need to encourage transparent contact among team members. A language service provider who has experience in multi-cultural team management is a definite asset.

Terminology validation

This should be an automated process to complement the terminology search tools the language service provider supplies to the translators and reviewers during the project.

This validation can also verify adherence to other standards, such as part-number consistency and typographical rules.
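An automated check of this kind could be sketched as below. The glossary entry, the part-number pattern, and the function name are hypothetical; a real validation would draw on the terminology database and the numbering conventions agreed with the language service provider.

```python
import re

# Hypothetical approved glossary: source term -> required target term.
glossary = {"toner cartridge": "cartouche de toner"}

# Hypothetical part-number pattern, e.g. "PN 4471-A".
PART_NUMBER = re.compile(r"\bPN \d{4}-[A-Z]\b")

def validate(source, target):
    """Return a list of terminology and part-number issues for one segment pair."""
    issues = []
    for src_term, tgt_term in glossary.items():
        if src_term in source.lower() and tgt_term not in target.lower():
            issues.append(f"term '{src_term}' should be translated as '{tgt_term}'")
    # Part numbers must be carried over unchanged from source to target.
    src_parts = sorted(PART_NUMBER.findall(source))
    tgt_parts = sorted(PART_NUMBER.findall(target))
    if src_parts != tgt_parts:
        issues.append(f"part numbers differ: {src_parts} vs {tgt_parts}")
    return issues

print(validate("Replace the toner cartridge (PN 4471-A).",
               "Remplacez le toner (PN 4417-A)."))
```

An empty list means the segment pair passed both checks; anything else can be routed back to the translator or reviewer.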

Re-injection of content into the original XML file

Translated content can be automatically added back into the XML file if unique content IDs, as well as other indicators such as positioning markup, are used in the document. For example, if the source language component has an ID of 123, and the translated content component has an ID of 123 and associated language metadata (e.g., French), the translated content component can be put into the document in the same location as the original source.
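The ID-based re-injection can be sketched as follows, reusing the ID 123 and the French example from the paragraph above. The element and attribute names (`document`, `component`, `id`, `lang`) are assumptions for illustration, and the sketch assumes components that contain text only, with no inline markup to reposition.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; each component carries a unique ID.
SOURCE_XML = """<document>
  <component id="123" lang="en">Replace the toner cartridge.</component>
  <component id="124" lang="en">Empty the wastebasket.</component>
</document>"""

# Translations returned by the language service provider, keyed by component ID.
translations = {
    "123": "Remplacez la cartouche de toner.",
    "124": "Videz la corbeille.",
}

def reinject(xml_text, translated, target_lang):
    """Put each translation in the place of the source component with the same ID."""
    root = ET.fromstring(xml_text)
    for component in root.iter("component"):
        unit_id = component.get("id")
        if unit_id in translated:
            component.text = translated[unit_id]
            component.set("lang", target_lang)  # language metadata travels with the text
    return ET.tostring(root, encoding="unicode")

french_xml = reinject(SOURCE_XML, translations, "fr")
print(french_xml)
```

Because the match is made on the ID rather than on position or text, the order in which translations come back does not matter.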

Storage of updated content in the translation memory and in the CMS

Translated and reviewed content can be stored in the content management system (for reuse in the authoring process) as well as in the language service provider's translation memory (for comparison of updated and new content). In more complex systems, the translations remain only in the translation memory held by the language service provider. In that case, the translation memory is accessed during authoring, and for rendering the final document with an online service.

All language versions of the same element in the CMS should have the same DOC ID, but each one should be tagged with the appropriate <language ID>, so that importing new translated content can just be a part of the normal flow of the document in the workflow.
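One way to picture this arrangement is a store keyed first by DOC ID and then by language ID. The sketch below uses an in-memory dictionary as a hypothetical stand-in for the CMS repository; the function names are assumptions, not the API of any particular product.

```python
# Hypothetical in-memory stand-in for the CMS repository:
# {DOC ID: {language ID: text}}
repository = {}

def store(doc_id, language, text):
    """Save one language version of an element under its shared DOC ID."""
    repository.setdefault(doc_id, {})[language] = text

def fetch(doc_id, language):
    """Return the requested language version, or None if it does not exist yet."""
    return repository.get(doc_id, {}).get(language)

def missing_languages(doc_id, required):
    """List the required languages for which no version has been imported."""
    available = repository.get(doc_id, {})
    return [lang for lang in required if lang not in available]

store("123", "en", "Replace the toner cartridge.")
store("123", "fr", "Remplacez la cartouche de toner.")
print(fetch("123", "fr"))
print(missing_languages("123", ["en", "fr", "de"]))
```

Importing a new translation is then the same operation as saving any other version of the element, which is what lets it fit into the normal workflow.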

Managing the cost of translation memory licenses or access

Most companies do not want to carry the high cost of purchasing a translation memory tool and its frequent upgrades, nor the expense of accessing a hosted translation memory. The language service provider should be responsible for using and maintaining a translation memory tool, either commercial-off-the-shelf or proprietary.

Language service providers should be able to give online access to their translation memory to all contributors to a language project, including the customer, for full-text, in-context terminology searches.

Language service providers should clearly acknowledge that the translation memory is the customer's intellectual property, deliverable at any time in the TMX industry standard, which guarantees total interoperability. This gives the client complete control over their translation memory and allows them to move between language service providers if desired.
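To give an idea of what such a deliverable contains, the sketch below writes segment pairs into a minimal TMX-style document: a header, then one translation unit (`tu`) per segment with one variant (`tuv`) per language. The header values, including the tool name, are placeholders assumed for the example; the exact required attributes should be checked against the TMX specification before relying on this layout.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def to_tmx(pairs, source_lang, target_lang):
    """Serialize (source, target) segment pairs as a minimal TMX-style document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-exporter",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

tmx_xml = to_tmx(
    [("Empty the wastebasket.", "Videz la corbeille.")],
    source_lang="en",
    target_lang="fr",
)
print(tmx_xml)
```

Because the format is plain XML, the customer can archive the file or hand it to another provider without depending on the tool that produced it.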

Focusing on core competencies

Effectively creating, managing and delivering content in the right format and at the right time to meet customers' needs is a large task. Add to that the management of all the multilingual content and the task can be enormous. The most successful solution focuses on core competencies.

The organization can focus on core competencies such as technical and creative writers creating highly effective and easily translatable content, graphic artists designing the presentation, information architects determining the most effective ways of accessing the content, and marketing people defining the distribution channels for the content, with everyone on the content life cycle team ensuring that the content has been optimized for every customer touch point.

A language service provider can focus on their core competencies such as: maintenance of translation memories, the alignment of translation memory segments in multiple languages, the creation of multilingual terminology search tools, and the resolution of language differences between contributors.

Together they can perfect a seamless hand-over of content from the content management system workflow to the translation workflow.

Summary

Content management systems and translation memory systems use different technologies and skills, and serve different purposes.

Both worlds have their finest specialists, and their best-of-breed systems and vendors, which are not necessarily combined in one package.

"A typical scenario is that an organization contracts with an outside agency to do localization. (...) [2]

"Localizers are not an additional author. Rather, they are an acquisition and syndication partner. When you give them content, you are essentially syndicating content to them. They set the form of the content and you produce it. And when they are finished localizing, they become an acquisition source, passing the content back into your system as efficiently as possible. Or in old-style computer terms, you export content to them and then import it when they have finished."

References

[1] "Managing Enterprise Content: A Unified Content Strategy" by Ann Rockley with Pamela Kostur and Steve Manning (ISBN 0-7357-1306-5), New Riders, 2002

[2] "Content Management Bible, 2nd Edition" by Bob Boiko (ISBN 0-7645-7371-3), Wiley Publishing, 2005

Copyright 2005, The Rockley Group, Inc.